BSTS (Bayesian Structural Time Series) is a structural time series model.
Observation equation: \[y_t = Z_t^T \alpha_t + \epsilon_t, \qquad \epsilon_t \sim N(0, H_t)\] Transition equation: \[\alpha_{t+1} = T_t \alpha_t + R_t \eta_t, \qquad \eta_t \sim N(0, Q_t)\] \(Z_t\), \(T_t\) and \(R_t\) are model matrices containing a mix of known values and unknown parameters to be estimated.
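To make the two equations concrete, the following is a minimal simulation sketch, not the estimation procedure itself. It assumes time-invariant matrices, a scalar observation, and a transition that maps the state at time \(t\) to the state at \(t+1\). The function name `simulate_state_space` and the local linear trend example values are illustrative choices, not part of the model description above.

```python
import numpy as np

def simulate_state_space(Z, T, R, H, Q, alpha0, n_steps, seed=0):
    """Simulate y_t = Z^T alpha_t + eps_t, alpha_{t+1} = T alpha_t + R eta_t.

    Assumes constant Z, T, R, H, Q (an illustrative simplification).
    """
    rng = np.random.default_rng(seed)
    alpha = np.asarray(alpha0, dtype=float)
    ys, states = [], []
    for _ in range(n_steps):
        # Observation equation: scalar observation with variance H
        eps = rng.normal(0.0, np.sqrt(H))
        ys.append(Z @ alpha + eps)
        states.append(alpha.copy())
        # Transition equation: state noise eta_t ~ N(0, Q)
        eta = rng.multivariate_normal(np.zeros(Q.shape[0]), Q)
        alpha = T @ alpha + R @ eta
    return np.array(ys), np.array(states)

# Example: local linear trend, state = (level, slope)
Z = np.array([1.0, 0.0])                 # observe the level only
T = np.array([[1.0, 1.0], [0.0, 1.0]])   # level += slope, slope persists
R = np.eye(2)
H = 0.5
Q = np.diag([0.1, 0.01])
y, states = simulate_state_space(Z, T, R, H, Q, alpha0=[10.0, 0.2], n_steps=50)
```

Setting `H` and `Q` to zero reduces the simulation to a deterministic straight line, which is a quick way to check that the matrices encode the intended dynamics.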
The basic structural model we use is:
\[ \log(y_t) = \mu_t + \tau_t + \beta^T x_t + \epsilon_t \] \[ \mu_t = \mu_{t-1} + \delta_{t-1} + \nu_t \] \[ \delta_t = D + \phi (\delta_{t-1} - D) + \eta_t \] \[ \tau_t = -\sum_{s=1}^{P-1} \tau_{t-s} + \omega_t \] The current level of the trend is \(\mu_t\) and the current slope is \(\delta_t\). The seasonal component is \(\tau_t\), where \(P\) is the number of seasons in a year; \(P\) must be an integer here. The semi-local linear trend model is similar to the local linear trend model, but more useful for long-term forecasting. It assumes the level moves according to a random walk, while the slope follows an AR(1) process centered on a potentially nonzero value \(D\).
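As a rough illustration of how these components interact, the sketch below simulates \(\log(y_t)\) from a semi-local linear trend plus a seasonal term. It makes several assumptions: the regression term \(\beta^T x\) is omitted, the seasonal recursion uses the common sum-to-zero form (each new effect is minus the sum of the previous \(P-1\) effects plus noise), and the function name, starting values, and standard deviations are hypothetical.

```python
import numpy as np

def simulate_semilocal_seasonal(n, P, D, phi, sd_level, sd_slope,
                                sd_seas, sd_obs, seed=0):
    """Simulate log(y_t) = mu_t + tau_t + eps_t (regression term omitted)."""
    rng = np.random.default_rng(seed)
    mu, delta = 0.0, D                       # assumed starting level and slope
    # Previous P-1 seasonal effects, most recent first (arbitrary start)
    seas = list(rng.normal(0.0, 1.0, size=P - 1))
    log_y = np.empty(n)
    for t in range(n):
        # Seasonal effect: minus the sum of the previous P-1 effects, plus noise
        tau = -sum(seas) + rng.normal(0.0, sd_seas)
        # Level: random walk driven by the previous slope delta_{t-1}
        mu = mu + delta + rng.normal(0.0, sd_level)
        log_y[t] = mu + tau + rng.normal(0.0, sd_obs)
        # Slope: AR(1) process centered on the long-run slope D
        delta = D + phi * (delta - D) + rng.normal(0.0, sd_slope)
        seas = [tau] + seas[:-1]
    return log_y

log_y = simulate_semilocal_seasonal(n=104, P=4, D=0.01, phi=0.8,
                                    sd_level=0.02, sd_slope=0.005,
                                    sd_seas=0.01, sd_obs=0.05)
```

Because the slope reverts toward \(D\) when \(|\phi| < 1\), long-horizon paths from this simulation fan out more slowly than under a local linear trend, whose slope is a pure random walk.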
TBATS (Trigonometric seasonality, Box-Cox transformation, ARMA errors, Trend and Seasonal components) is a state space time series model.
\[ y_t^{(\omega)} = \text{Box-Cox}(y_t, \omega) \] \[ y_t^{(\omega)} = l_{t-1} + \phi \, b_{t-1} + s_t + d_t \] \[ l_t = l_{t-1} + \phi \, b_{t-1} + \alpha \, d_t \] \[ b_t = (1-\phi) B + \phi \, b_{t-1} + \beta \, d_t \] The seasonal component \(s_t\) uses a trigonometric representation based on Fourier series.
\[ d_t = \sum_{i=1}^{p} \phi_i d_{t-i} + \sum_{i=1}^{q} \theta_i \epsilon_{t-i} + \epsilon_t \]
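Here, \(l_t\) is the local level, \(B\) is the long-run trend, \(b_t\) is the local slope, \(d_t\) is an ARMA\((p, q)\) process with coefficients \(\phi_i\) and \(\theta_i\), and \(\epsilon_t\) is white noise. The remaining quantities are tuning parameters: \(\omega\) is the Box-Cox parameter, \(\phi\) is the damping parameter, and \(\alpha\) and \(\beta\) are smoothing parameters.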
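Two building blocks of this model are easy to illustrate in isolation: the Box-Cox transformation and the ARMA\((p, q)\) error process. The sketch below is not a TBATS implementation; it omits the level, slope, and trigonometric seasonal recursions, and the function names and example coefficients are assumptions made for illustration.

```python
import numpy as np

def box_cox(y, omega):
    """Box-Cox transform: log(y) if omega == 0, else (y^omega - 1) / omega."""
    y = np.asarray(y, dtype=float)
    if omega == 0:
        return np.log(y)
    return (y ** omega - 1.0) / omega

def inv_box_cox(z, omega):
    """Inverse of box_cox, used to map forecasts back to the original scale."""
    z = np.asarray(z, dtype=float)
    if omega == 0:
        return np.exp(z)
    return (omega * z + 1.0) ** (1.0 / omega)

def simulate_arma(ar, ma, n, sd=1.0, seed=0):
    """Simulate d_t = sum_i ar[i] d_{t-1-i} + sum_i ma[i] eps_{t-1-i} + eps_t.

    Values before the start of the series are treated as zero.
    """
    rng = np.random.default_rng(seed)
    p, q = len(ar), len(ma)
    eps = rng.normal(0.0, sd, size=n)
    d = np.zeros(n)
    for t in range(n):
        ar_part = sum(ar[i] * d[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma_part = sum(ma[i] * eps[t - 1 - i] for i in range(q) if t - 1 - i >= 0)
        d[t] = ar_part + ma_part + eps[t]
    return d, eps

sales = np.array([120.0, 150.0, 90.0, 200.0])   # made-up values
transformed = box_cox(sales, omega=0.5)
recovered = inv_box_cox(transformed, omega=0.5)  # equals sales up to rounding
d, eps = simulate_arma(ar=[0.6], ma=[0.3], n=100)
```

The case \(\omega = 0\) corresponds to the log transformation used in the BSTS formulation, and \(\omega = 1\) leaves the series unchanged apart from a shift.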
We analyze the current forecasting model separately for each quarter.
| Quarter | Training Data Ends | Test Data Starts | Test Data Ends | Number of Stores |
|---|---|---|---|---|
| Quarter 1 | 11748 | 11801 | 11813 | 1026 |
| Quarter 2 | 11709 | 11714 | 11726 | 585 |
| Quarter 3 | 11722 | 11727 | 11739 | 536 |
| Quarter 4 | 11735 | 11740 | 11713 | 828 |
L: Easter, C: Thanksgiving, R: Christmas
We can use the driver variables identified in Phase 2 to build a regression model:
\[Scaling\ Factor = f(x_1, x_2, \ldots,x_p)\]
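Here \(x_1, \ldots, x_p\) are the identified driver variables. Multicollinearity among the driver variables is a serious problem for this approach, because it makes the estimated regression coefficients unstable, so we abandon this method.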
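One common way to quantify multicollinearity is the variance inflation factor (VIF): regress each driver on all the others and compute \(1 / (1 - R^2)\). The sketch below is a generic diagnostic on synthetic data, not the analysis performed on the actual driver variables; the function name and the simulated columns are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Return the VIF of each column of X (rows = observations)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        target = X[:, j]
        # Regress column j on an intercept plus all other columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs

# Synthetic drivers: x2 is nearly a copy of x1, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)
x3 = rng.normal(size=200)
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
```

In this synthetic example the first two columns receive very large VIFs while the third stays close to 1. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity.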
We will judge the efficacy of implementing Phase 4.